When I saw this mentioned on HN I immediately knew this kind of comment would be there, because something along the lines of (and I'm paraphrasing) "it's probably fast because it's not enterprise enough" was repeated, often multiple times, in every place refterm was shared. Even after being shown all the proof in the world that it's in fact the opposite, people almost refused to believe that software can be that much better than today's standard, even to the point of arguing that a 16 fps terminal is better than a 7500 fps one because that many frames would probably consume too many resources.
Before, I found Casey's tone when criticizing bad software off-putting, but now I understand that many years of such arguments take a toll on you.
Seconding. It takes doing some low-level gamedev[0] stuff, or using software written by people like Casey, to realize just how fast software can be. There's an art to it, and it hits diminishing returns with complex software, but the baseline of popular software is so low it doesn't take much more than a pinch of care to beat it by an order of magnitude.
(Cue the "but why bother, my app is IO bound anyway" counterarguments. That happens, but people too often forget you can design software to avoid or minimize the time spent waiting on IO. And I don't mean just "use async" - I mean designing its whole structure and UX alike to manage waits better; there's a rough sketch of the idea after the footnote.)
--
[0] - Or HFT, or browser engine development, few other performance-mindful areas of software.
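To make that concrete, here's a minimal C++ sketch of one such structural change: reading the next chunk while the current one is being processed, instead of waiting, then working, then waiting again. (The file name, chunk size, and process() body are placeholders, not anything from a real project.)

    // Overlap IO with work: fetch chunk N+1 while chunk N is being processed.
    #include <cstddef>
    #include <fstream>
    #include <future>
    #include <vector>

    static void process(const std::vector<char>& chunk) {
        // placeholder for whatever work the application does per chunk
        (void)chunk;
    }

    int main() {
        std::ifstream in("data.bin", std::ios::binary);  // placeholder input
        constexpr std::size_t kChunk = 1 << 20;          // 1 MiB per read

        auto read_chunk = [&]() {
            std::vector<char> buf(kChunk);
            in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
            buf.resize(static_cast<std::size_t>(in.gcount()));
            return buf;
        };

        std::vector<char> current = read_chunk();
        while (!current.empty()) {
            // Kick off the next read on another thread...
            auto next = std::async(std::launch::async, read_chunk);
            process(current);            // ...while this thread does the work.
            current = next.get();
        }
    }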
I feel obliged to point out the destructive power of Knuth's statement, "Premature optimization is the root of all evil."
I have encountered far too many people who interpret that to mean, "thou shalt not even consider performance until a user, PM, or executive complains about it."
The irony is that the very paragraph in which Knuth made that statement (and the paper, and Knuth's programming style in general) is very much pro-optimization. He used that statement in the sense of "Sure, I agree with those who say that blind optimization everywhere is bad, but where it matters…".
Here's the quote in context:
> There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.…
And from the same paper, an explicit statement of his attitude:
> The improvement in speed from Example 2 to Example 2a [which introduces a couple of goto statements] is only about 12%, and many people would pronounce that insignificant. The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by pennywise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a one-shot job, but when it's a question of preparing quality programs, …
Knuth's statement was basically "use a profiler and optimize the hot path instead of trying to optimize with your intuition", which is great advice. Most people heard "don't optimize at all". Something that you can derive from that advice is "have a hot path to optimize". I've seen a few programs that aren't trivial to optimize because the work is distributed everywhere.
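A hedged illustration of what that looks like in practice (the two loops below are stand-in workloads, not anything real; an actual profiler such as perf or VTune gives you this breakdown per function automatically):

    // Crude timing harness: let measurements, not intuition, pick the hot path.
    #include <chrono>
    #include <cstdio>

    template <typename F>
    static double time_ms(F&& f) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main() {
        volatile double sink = 0;  // keeps the compiler from deleting the loops
        double heavy = time_ms([&] { for (int i = 0; i < 10000000; ++i) sink = sink + i * 0.5; });
        double light = time_ms([&] { for (int i = 0; i < 10000; ++i) sink = sink + i; });
        std::printf("stage A: %8.2f ms\n", heavy);
        std::printf("stage B: %8.2f ms\n", light);
        // Optimize whichever stage dominates; leave the other 97% alone.
    }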
I'm what most people call an FPGA Engineer, though I work all the way from boards/silicon to cloud applications. The number of times I've been asked to consult on performance in the software world, where the answer to how to do it right was me telling them "rm -rf $PROBLEMATIC_CODE" and then go rewrite it with a good algorithm, is way too damn high. Also, the number of times someone asked me to accelerate something on an FPGA, only for me to implement it to run on a GPU in about 2-3 days using SYCL + OpenCL, is insane. Sure, I could get another 2x improvement... or we can accept the 1,000x improvement I just gave you at a much lower price.
Which, of course, never really happens, because PMs and execs always want more features, and performance is never a feature for them until it becomes so noticeably bad that they begrudgingly admit they should do the minimum to make users stop complaining.
Agreed. As a young, performance-oriented coder I've often been looked down on by people who used Knuth's almost god-like authority to dress up all sorts of awful engineering.
And of course most people don't know the full quote and they don't care about what Knuth really meant at the time.
>> I've often been looked down on by people who used Knuth's almost god-like authority to dress up all sorts of awful engineering.
Quick quips don't get to trump awful engineering. Just call Knuth a boomer and point to the awful aspects of the actual code. No disrespect to Knuth - just dismiss him as easily as people use him to dismiss real problems.
> I feel obliged to point out the destructive power of Knuth's statement, "Premature optimization is the root of all evil."
Except that line was written in a book (The Art of Computer Programming, Volume 1) that was itself written from the ground up in assembly language.
It's been a while since I read the quote in context, but IIRC it was about saving one instruction between a for-loop that counts up and one that counts down.
"Premature optimization is the root of all evil" if you're deciding to save 1-instruction to count from "last-number to 0", taking advantage of the jnz instruction common in assembly languages, rather than "0 to last-number". (that is: for(int i=0; i<size; i++) vs for(int i=size-1; i>=0; i--). The latter is slightly more efficient).
Especially because "last-number to 0" is a bit more difficult to think about and prove correct in a number of cases.
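Spelled out, the two loop shapes in question, with a toy body just so they compile (whether the difference survives a modern optimizing compiler is a separate question):

    // Counting up: the loop test needs a compare against `size` each iteration.
    void count_up(int* a, int size) {
        for (int i = 0; i < size; i++) a[i] = 0;
    }

    // Counting down: comparing against zero is cheaper than comparing against an
    // arbitrary `size`, and compilers can often close the loop with a single
    // decrement-plus-conditional-jump (the jnz mentioned above).
    void count_down(int* a, int size) {
        for (int i = size - 1; i >= 0; i--) a[i] = 0;
    }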
I'm not an experienced programmer but if I took all these maxims seriously...
If I don't think about performance and other critical things before committing to a design, I know that in the end I will have to rewrite everything or deliver awful software. Being lazy and a perfectionist, those are two things I really want to avoid.
I find it striking that a decent modern laptop would have been a supercomputer 20 years ago, when people used Office 97 that was feature complete already IMO. I can't help this constant cognitive dissonance with modern software; do we really need supercomputers to move Windows out of the box?
We need some extra processing power to support larger screens and refresh rates. Arguably, security benefits of managed code / sandboxing are worth it - but the runtimes seem to be pretty-well optimized. Other than that, I don't see anything reasonable to justify the low performance of most software.
Uh, yeah, 4K etc., but most modern machines are still 1920x1080@60Hz, which is only about 8% more pixels than 1600x1200 - not an uncommon resolution in the late 1990s, usually running at 75Hz or better over analog VGA cables. So it's actually _LESS_ bandwidth, which is why many of us cried about the decade-plus of regression in resolution/refresh brought on by LCD manufacturers deciding computer monitors weren't worthy of being anything but overpriced TV screens. It's still ongoing, but at least there are some alternatives now.
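For reference, the back-of-the-envelope numbers behind that claim (raw pixels per second, ignoring blanking intervals and color depth):

    constexpr long long crt_1600x1200_75 = 1600LL * 1200 * 75;  // 144,000,000 pixels/s
    constexpr long long lcd_1080p_60     = 1920LL * 1080 * 60;  // 124,416,000 pixels/s
    static_assert(lcd_1080p_60 < crt_1600x1200_75,
                  "1080p@60 pushes fewer pixels per second than 1600x1200@75");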
It is possible to get Office 97 (or, for that matter, Office 2003, which is one of the last non-sucky versions) and run it on a modern machine. It does basically everything instantly, including starting. So I don't really think resolution is the problem.
PS: I've had multiple monitors since the late 80s too, in various forms - in the late 1990s driving multiple large CRTs at high resolution from secondary PCI graphics cards, until cards started coming with multiple ports (thanks, Matrox!) at reasonable prices.
I'd imagine software is bloated and grown until it is still just about usable on modern hardware. Making it any faster than that is probably seen as premature optimisation.
I'd imagine perhaps this is how product teams are assessed: is the component just about fast enough, and does it have lots of features? So long as MS Office is the most feature-rich software package, enterprise will buy nothing else, slow or not.
It doesn't even need to be the most feature-rich any more. Microsoft has figured out that the key is corporate licensing.
Is Teams better than Zoom? No, but my last employer ditched Zoom because Teams was already included in their enterprise license package and they didn't want to pay twice for the same functionality.
I think there's a story in here that most are missing, but your comment is closest to touching on. This was not a performance problem. This was a fundamental problem that surfaced as a performance issue.
The tech stack in use in the Windows Terminal project is new code bolted onto old code, and no one on the existing team knows how that old code works. No one understands what it's doing. No one knows whether the things that old code needed to do are still needed.
It took someone like Casey, who knew gamedev, to know instinctually that all of that stuff was junk and you could rewrite it in a weekend. The Microsoft devs, if they wanted to dive into the issue, would be forced to Chesterton's Fence every single line of code. It WOULD have taken them years.
We've always recommended that programmers know the code one and possibly two layers below them. This recommendation failed here, it failed during the GTA loading-times scandal, and it has failed millions of times; the ramifications of that failure are a chaos of performance issues.
I've come to realize that much of the problems we have gotten ourselves into come down to what I call feedback bandwidth. If you are an expert, as Casey is, you have infinite bandwidth and are limited only by your ability to experiment. If your iteration time is a couple of seconds, you will be able to create projects that are fundamentally impossible without that feedback.
If you need to discuss something with someone else, that bandwidth drops like a stone. If you need a team of experts, all IMing each-other 1 on 1, you might as well give up. 2 week Agile sprints are much better than months to years long waterfall, but we still have so much to learn. If you only know if the sprint is a success after everyone comes together, you are doomed. The people iterating dozens of times every hour will eat your shorts.
I'm not saying that only a single developer should work on entire projects. But what I am saying is that when you have a Quarterback and Wide Receiver that are on the same page, talking at the same abstraction level, sometimes all it takes is one turn, one bit of information, to know exactly what the other is about to do. They can react together.
Simple is not easy. Matching essential complexity might very well be impossible. Communication will never be perfect. But you have to give it a shot.
Thanks for the “know the code one and possibly two layers below them” point, haven’t seen it written out explicitly before, but it sure puts into perspective why I consider some people much better programmers than others!
I started off programming doing web development, working on a community-run asynchronous game where we needed to optimize everything to run in minimal time and power to save on cost and annoyance. It was a great project to work on as a high schooler.
Then in college, I studied ECE and worked in a physics lab where everything needed to run fast enough to read out ADCs as quickly as allowed by the datasheet.
Then I moved to defense doing FPGA and SW work (and I moonlighted in another group consulting internally on verification for ASICs). Again, everything was tightly controlled. On a PCI-e transfer, we were allowed 5 us of maximum overhead. The rest of the time could only be used for streaming data to and from a device. So if you needed to do computation with the data, you needed to do it in flight and every data structure had to be perfectly optimized. Weirdly, once you have data structures that are optimized for your hardware, the algorithms kind of just fall into place. After that, I worked on sensor fusion and video applications for about a year where our data rates for a single card were measured in TB/s. Needless to say, efficiency was the name of the game.
After that, I moved to HFT. And weirdly, outside of the critical tick-to-trade path or microwave stuff, this industry has a lot less care around tight latencies and has crazy low data rates compared to what I'm used to working with.
So when I go look at software and stuff is slow, I'm just suffering, because I know all of this can be done faster and more efficiently (I once shaved 99.5% of the run time off someone's code with better data packing to align to cache lines, better addressing to minimize page thrashing, and automated loop unrolling into threads - all in about one day of work). Software developers seriously need to learn to optimize proactively... or just write less shitty code to begin with.
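For anyone wondering what "better data packing to align to cache lines" means in practice, here's a generic before/after sketch (field names and sizes are made up; the real case was domain-specific):

    #include <cstdint>
    #include <vector>

    // Before: hot and cold fields interleaved. A loop that only needs `value`
    // still drags the cold bytes of every record through the cache.
    struct RecordAoS {
        uint64_t key;
        char     description[48];  // cold, rarely touched in the hot loop
        double   value;
    };

    // After: structure-of-arrays, with the hot field stored contiguously so each
    // 64-byte cache line delivers 8 useful doubles instead of one.
    struct RecordsSoA {
        std::vector<uint64_t> keys;
        std::vector<double>   values;
    };

    double sum_values(const RecordsSoA& r) {
        double s = 0;
        for (double v : r.values) s += v;  // sequential, prefetch-friendly access
        return s;
    }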
While that's true, in this particular case with Casey's impl it's not an art. The one thing that drastically improved performance was caching - literally the simplest, most obvious thing to do when you have performance problems.
The hiring machine merely ensures that they can leetcode their way out of an interview and into the job. It doesn't care about what they're supposed to know :)
Even something like a JSON parser is often claimed to be IO bound. It almost never is, because few parsers could keep up with modern IO, and some cannot even keep up with old HDDs.
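A rough way to test that claim yourself: parse a buffer that's already in memory and compare the resulting bytes per second against what your storage can deliver. (The "parser" below is a stand-in that just counts structural characters; swap in a real JSON parser, the measurement is the point.)

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <string>

    // Stand-in "parser": replace with a call into your actual JSON library.
    static std::size_t parse_json(const std::string& text) {
        std::size_t nodes = 0;
        for (char c : text)
            if (c == '{' || c == '[') ++nodes;
        return nodes;
    }

    int main() {
        // Synthetic ~100 MB payload so the timing is meaningful.
        std::string doc;
        while (doc.size() < 100u * 1024 * 1024) doc += R"({"k":[1,2,3],"s":"abc"},)";

        auto t0 = std::chrono::steady_clock::now();
        volatile std::size_t nodes = parse_json(doc);
        auto t1 = std::chrono::steady_clock::now();
        (void)nodes;

        double secs = std::chrono::duration<double>(t1 - t0).count();
        std::printf("parse throughput: %.2f GB/s\n", doc.size() / secs / 1e9);
        // Compare with storage: an old HDD streams ~0.1-0.2 GB/s, NVMe several GB/s.
    }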
Third'ing: the current crop of new developers have no freaking idea how much power they have at their fingertips. Us greybeard old game developers look at today's hardware and literally cream our jeans compared to the crap we put up with in previous decades. People have no idea, playing with their crappy interpreted languages, just how much raw power we have if one is willing to learn the low-level languages to access it. (Granted, NumPy and BLAS do a wonderful job for JIT'd languages.)
I'd say it is almost the other way around. We have so much wonderful CPU power that we can spare some for the amazing flexibility of Python etc.
Also, it's not that simple. One place I worked at (a scientific-computation kind of place), we'd prototype everything in Python, and production would be carefully rewritten in C++. Standards were very high for prod, and we struggled to hire "good enough" modern C++ people - endless debates about "ooh, template meta-programming or structs or bare-metal pointers" kind of stuff.
Three times out of four, the Python prototype was faster than the subsequent C++. It was because it had to be: the prototype was run and re-run and tweaked many times in the course of development. The C++ was written once, deployed, and churned daily, without anyone caring about its speed.
Python has nice optimised libs for that, so it's not completely a surprise for that kind of application.
If you're doing generic symbol shuffling with a bit of math, Python is fast-ish to code and horribly slow to run. You can easily waste a lot of performance - and possibly cash - trying to use it for production.
Whether or not you'll save budget by writing your own optimised fast libs in some other lang is a different issue, and very application dependent.
Worth bearing in mind that Casey has a long history of unsuccessfully trying to nudge Microsoft to care about performance and the fact that he's still being constructive about it is to his credit.
I highly respect Casey, but given his abrasive communication style I sometimes wonder if he isn't trying to trigger people (MS devs in this case) into pushing back so he can make his point.
Honestly, I felt like the ones to start with the condescending tones were the Microsoft devs who kept talking down to Casey about You Don't Understand How Hard This Is, when they also readily admitted they didn't understand the internals very well.
I don't think they're actually contradicting themselves there. They know enough about how hard text rendering is to conclude that they're better off delegating it to the team that specializes in that particular area, even though it means they have to settle for a good-enough abstraction rather than winning at a benchmark.
Enterprise deployment of a Somebody Else's Problem field can really harm innovation:
“Any object around which an S.E.P. is applied will cease to be noticed, because any problems one may have understanding it (and therefore accepting its existence) become Somebody Else's Problem.”
Agree. Rendering text well really is hard, if you sit down and try to do it from scratch. It’s just that dealing with all of the wonderful quirks of human languages doesn’t have to make it _slow_. That’s their mistake.
And you’re right; all refterm really does is move the glyph cache out to the GPU rather than copying pixels from a glyph cache in main memory every frame.
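For readers who haven't followed the series: a rough sketch of what "glyph cache on the GPU" means here. This is a paraphrase of the idea, not refterm's actual code - each distinct glyph gets rasterized once into a texture atlas that lives on the GPU, and per frame the CPU only emits (cell, atlas slot) references.

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>

    struct GlyphKey {
        uint32_t codepoint;
        uint32_t attributes;  // bold/italic/etc. folded into the key
        bool operator==(const GlyphKey& o) const {
            return codepoint == o.codepoint && attributes == o.attributes;
        }
    };
    struct GlyphKeyHash {
        std::size_t operator()(const GlyphKey& k) const {
            return (std::size_t(k.codepoint) << 8) ^ k.attributes;
        }
    };

    struct AtlasSlot { uint16_t x, y; };  // where the glyph lives in the GPU atlas

    class GlyphCache {
    public:
        AtlasSlot get(const GlyphKey& key) {
            auto it = cache_.find(key);
            if (it != cache_.end()) return it->second;  // already on the GPU: cheap
            AtlasSlot slot = rasterizeAndUpload(key);   // slow path, once per glyph
            cache_.emplace(key, slot);
            return slot;
        }
    private:
        AtlasSlot rasterizeAndUpload(const GlyphKey&) {
            // Placeholder: shape/rasterize the glyph once, copy the bitmap into
            // the next free atlas cell on the GPU, return its coordinates.
            AtlasSlot slot{next_x_, 0};
            next_x_ = static_cast<uint16_t>(next_x_ + 16);
            return slot;
        }
        std::unordered_map<GlyphKey, AtlasSlot, GlyphKeyHash> cache_;
        uint16_t next_x_ = 0;
    };

    // Per frame, the renderer only looks up slots and draws quads; the expensive
    // shaping/rasterizing work is amortized across every frame that reuses a glyph.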